Task 1 – Curse of dimensionality and effect of sample size

For each dimension D from 1 to 15, simulate 1000 random D-dimensional points, where the value in each dimension is uniformly distributed between -1 and +1.

a)

Calculate the fraction of these points that are within distance 1 of the origin. This fraction approximates the ratio of the volume of the unit hypersphere to the volume of the hypercube that encloses it. Plot this fraction as a function of D (a scatter plot of the fraction versus D).
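The simulation in part (a) could be sketched in Python as follows. This is a minimal sketch using only the standard library; the function name `fraction_inside_unit_ball` and the fixed seed are illustrative assumptions, not part of the assignment.

```python
import random


def fraction_inside_unit_ball(dim, n_points=1000, rng=None):
    """Estimate the fraction of uniform points in [-1, 1]^dim lying within distance 1 of the origin."""
    rng = rng if rng is not None else random.Random()
    inside = 0
    for _ in range(n_points):
        # Comparing the squared distance with 1 avoids taking a square root.
        squared_norm = sum(rng.uniform(-1.0, 1.0) ** 2 for _ in range(dim))
        if squared_norm <= 1.0:
            inside += 1
    return inside / n_points


rng = random.Random(0)  # fixed seed (an arbitrary choice) so the run is repeatable
fractions = {dim: fraction_inside_unit_ball(dim, rng=rng) for dim in range(1, 16)}
for dim, frac in fractions.items():
    print(dim, frac)
```

The resulting pairs can be passed to any plotting routine, for example a scatter plot with D on the horizontal axis. In one dimension every sampled point lies within distance 1 of the origin, so the fraction there is exactly 1.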

b)

Use the value of this fraction at D = 2 and D = 3 to estimate π, using the known formulas for the area of a circle (D = 2) and the volume of a sphere (D = 3).

With r = 1:

Area of enclosing square = (2r)^2 = 4

Area of unit circle = pi*r^2 = pi

fraction = pi/4 --> pi = 4*fraction

Volume of enclosing cube = (2r)^3 = 8

Volume of unit sphere = (4/3)*pi*r^3 = (4/3)*pi

fraction = ((4/3)*pi)/8 = pi/6 --> pi = 6*fraction
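The two relations above translate directly into code. The sketch below is one possible way to apply them, assuming the same sampling scheme as in part (a); `PI_SCALE`, `estimate_pi`, and the seed are hypothetical names and choices made for illustration.

```python
import math
import random

# Scale factors from the ratios above: pi = 4 * fraction (D = 2), pi = 6 * fraction (D = 3).
PI_SCALE = {2: 4.0, 3: 6.0}


def fraction_inside_unit_ball(dim, n_points, rng):
    """Fraction of uniform points in [-1, 1]^dim within distance 1 of the origin."""
    inside = sum(
        1
        for _ in range(n_points)
        if sum(rng.uniform(-1.0, 1.0) ** 2 for _ in range(dim)) <= 1.0
    )
    return inside / n_points


def estimate_pi(dim, n_points=1000, rng=None):
    """Monte Carlo estimate of pi from the inside-fraction at D = 2 or D = 3."""
    if dim not in PI_SCALE:
        raise ValueError("only D = 2 and D = 3 are handled in this sketch")
    rng = rng if rng is not None else random.Random()
    return PI_SCALE[dim] * fraction_inside_unit_ball(dim, n_points, rng)


rng = random.Random(0)  # arbitrary seed for repeatability
for dim in (2, 3):
    estimate = estimate_pi(dim, rng=rng)
    print(f"D = {dim}: pi is approximately {estimate:.3f} (error {estimate - math.pi:+.3f})")
```

With only 1000 points the estimates are expected to be rough, and they change from run to run unless the seed is fixed.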

c)

Perform the calculations in part (b) with larger sample sizes. You can use the following set: {5000, 10000, 25000, 50000, 100000}. Visualize the estimated π for the D = 2 and D = 3 cases. Comment on your results.
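A possible way to run the estimates over the suggested sample sizes is sketched below. It assumes the same sampling scheme as in part (a) and the scale factors 4 and 6 from part (b); the names `SAMPLE_SIZES` and `pi_estimates_by_sample_size` are illustrative only.

```python
import math
import random

SAMPLE_SIZES = (5000, 10000, 25000, 50000, 100000)


def estimate_pi(dim, n_points, rng):
    """Monte Carlo estimate of pi from the fraction of points inside the unit ball."""
    scale = {2: 4.0, 3: 6.0}[dim]  # pi = 4 * fraction (D = 2), pi = 6 * fraction (D = 3)
    inside = sum(
        1
        for _ in range(n_points)
        if sum(rng.uniform(-1.0, 1.0) ** 2 for _ in range(dim)) <= 1.0
    )
    return scale * inside / n_points


def pi_estimates_by_sample_size(sample_sizes=SAMPLE_SIZES, seed=0):
    """Return {dim: [estimate for each sample size]} for D = 2 and D = 3."""
    rng = random.Random(seed)
    return {dim: [estimate_pi(dim, n, rng) for n in sample_sizes] for dim in (2, 3)}


estimates = pi_estimates_by_sample_size()
for dim, values in estimates.items():
    for n, value in zip(SAMPLE_SIZES, values):
        print(f"D={dim} n={n:>6} estimate={value:.4f} abs error={abs(value - math.pi):.4f}")
```

Plotting each list against the sample sizes, with a horizontal line at π, is one way to visualize the result. The standard error of a Monte Carlo fraction shrinks roughly as 1/sqrt(n), so the estimates should tend to cluster more tightly around π as n grows, although any single run can deviate from that trend.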


d)

Repeat this simulation: for each D from 1 to 15, sample 1000 D-dimensional points, where the value in each dimension is uniformly distributed between -1 and +1. For each value of D, generate an additional 100 test instances and calculate the distance from each test instance to its nearest neighbor among the 1000 sampled points. Plot the average distance from the test instances to their nearest neighbors as a function of D.

The average distance from the test instances to their nearest neighbors increases as the dimension increases, because a fixed number of points covers the space more sparsely as D grows.
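The nearest-neighbor experiment could be sketched as below. This is a brute-force version using the standard library (`math.dist` requires Python 3.8 or later); the helper names and the seed are assumptions made for illustration.

```python
import math
import random


def sample_points(n_points, dim, rng):
    """Draw n_points points uniformly from [-1, 1]^dim."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_points)]


def mean_nearest_neighbor_distance(dim, n_train=1000, n_test=100, rng=None):
    """Average Euclidean distance from each test point to its closest training point."""
    rng = rng if rng is not None else random.Random()
    train = sample_points(n_train, dim, rng)
    test = sample_points(n_test, dim, rng)
    # Brute force: compare every test point with every training point.
    nearest = [min(math.dist(t, p) for p in train) for t in test]
    return sum(nearest) / n_test


rng = random.Random(0)  # arbitrary seed for repeatability
avg_distance = {dim: mean_nearest_neighbor_distance(dim, rng=rng) for dim in range(1, 16)}
for dim, dist in avg_distance.items():
    print(dim, round(dist, 4))
```

Brute force is adequate at this scale (100 test points against 1000 sampled points per dimension); a spatial index would only matter for much larger samples.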

Task 2 – Practicing data manipulation skills on images

a)

Read the image into a variable in R/Python. If you use R, you need to install the “jpeg” package to read the image into a variable. For Python, an alternative is to use the matplotlib package. What is the structure of the variable that stores the image? What is its dimension? Display the image. (Hint: google “rasterImage”)
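In Python, an image read with matplotlib is a NumPy array, so its structure can be inspected through the array attributes. The sketch below assumes NumPy is installed and uses a small random array as a stand-in for the real photo, since the image file is not part of this text; `describe_image` is a hypothetical helper.

```python
import numpy as np


def describe_image(img):
    """Summarize how an image array is stored."""
    return {
        "type": type(img).__name__,
        "shape": img.shape,        # (rows, columns, channels) for a color image
        "dtype": str(img.dtype),
        "ndim": img.ndim,
    }


# Stand-in data so the sketch runs without a file. With the real photo, the array
# would instead come from something like matplotlib.pyplot.imread(path).
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)

info = describe_image(img)
print(info)
```

With matplotlib available, `matplotlib.pyplot.imshow(img)` followed by `matplotlib.pyplot.show()` displays the array as an image.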

b)

Display each channel as a separate image.
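One way to prepare the per-channel images is to keep a single channel and zero out the others, so each result still displays in its own color. The sketch below assumes an array of shape (rows, columns, 3) and uses stand-in data; `split_channels` is an illustrative name.

```python
import numpy as np


def split_channels(img):
    """Return one image per channel, each keeping only that channel's values."""
    channel_images = []
    for c in range(img.shape[2]):
        single = np.zeros_like(img)
        single[:, :, c] = img[:, :, c]  # copy one channel, leave the others at zero
        channel_images.append(single)
    return channel_images


# Stand-in data in place of the real photo.
rng = np.random.default_rng(0)
img = rng.integers(1, 256, size=(4, 6, 3), dtype=np.uint8)

red, green, blue = split_channels(img)
print(red.shape, green.shape, blue.shape)
```

Each returned array can be shown with `matplotlib.pyplot.imshow`. An alternative is to display the 2-D slice `img[:, :, c]` directly with a grayscale colormap.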

c)

For each channel, take the average of the columns and plot the average as a line plot for each channel on a single plot.
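The column averages could be computed as sketched below. The sketch reads "average of the columns" as one mean per image column, taken over all rows, which is an interpretation rather than something the task states; `column_means` and the small test array are illustrative.

```python
import numpy as np


def column_means(img):
    """Mean over rows for every column and channel; result has shape (columns, channels)."""
    # Convert to float first so that integer pixel types do not affect the mean.
    return img.astype(float).mean(axis=0)


# Small deterministic stand-in image: 2 rows, 4 columns, 3 channels.
img = np.arange(24, dtype=np.uint8).reshape(2, 4, 3)
means = column_means(img)
print(means)
```

Calling `matplotlib.pyplot.plot(means[:, c])` once per channel `c` on the same axes gives the three line plots on a single figure.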

d)

For each channel, subtract one half of the image from the other half (the choice of halves is up to you, but dividing the head image vertically into two parts makes more sense). If you observe negative pixel values, you can make them equal to zero. Then:

e)

In order to create a noisy image, add random noise drawn from a uniform distribution with a minimum value of 0 and a maximum value of “0.1 * maximum pixel value observed” to each pixel value for each channel of the original image.

• Display the new image.

• Display each channel separately as a separate image.
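The pixel-level operations in parts (d) and (e) could be sketched as follows. The sketch assumes a left/right split for part (d), and the function names `subtract_halves` and `add_uniform_noise` as well as the stand-in image are hypothetical.

```python
import numpy as np


def subtract_halves(img):
    """Left half minus right half for every channel, with negative results set to zero."""
    half = img.shape[1] // 2
    # Use a signed type so the subtraction cannot wrap around for unsigned pixels.
    left = img[:, :half].astype(np.int32)
    right = img[:, half:2 * half].astype(np.int32)  # drops the last column if the width is odd
    return np.clip(left - right, 0, None).astype(img.dtype)


def add_uniform_noise(img, rng, level=0.1):
    """Add noise drawn from U(0, level * max pixel value) to every pixel of every channel."""
    upper = level * float(img.max())
    noise = rng.uniform(0.0, upper, size=img.shape)
    return img.astype(float) + noise


# Stand-in image: left half brighter than right half.
demo = np.zeros((2, 4, 3), dtype=np.uint8)
demo[:, :2] = 100
demo[:, 2:] = 30

diff = subtract_halves(demo)
noisy = add_uniform_noise(demo, np.random.default_rng(0))
print(diff.shape, noisy.min(), noisy.max())
```

The noisy result is a float array whose values can exceed the original maximum, so it may need to be clipped or rescaled to the valid pixel range before it is displayed.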